Abstract

Using a collection of simulated and real benchmarks, we compare Bayesian and frequentist
regularization approaches in a poorly informative setting, when the number of variables is
almost equal to the number of observations. This comparison includes new global noninformative
approaches for Bayesian variable selection built on Zellner's g-priors that are similar to those of
Liang et al. (2008). The interest of these calibration-free proposals is discussed. The numerical
experiments we present highlight the appeal of Bayesian regularization methods when compared
with non-Bayesian alternatives: they dominate frequentist methods in the sense that they provide
smaller prediction errors while selecting the most relevant variables in a parsimonious way.

Keywords: Model choice, regularization methods, noninformative priors, Zellner's g-prior,
calibration, Lasso, elastic net, Dantzig selector.
1 Introduction



Given a response variable $y$ and a collection of $p$ associated potential predictor variables
$x_1, \ldots, x_p$, the classical linear regression model imposes a linear dependence on the
conditional expectation (Rao, 1973):

$$\mathbb{E}[y \mid x_1, \ldots, x_p] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$

A fundamental inferential direction for those models relates to the variable selection problem,
namely that only variables of relevance should be kept within the regression while the others
should be removed. While we cannot discuss at length the potential applications of this perspective,
variable selection is particularly relevant when the number $p$ of regressors is larger than the number
$n$ of observations (as in microarray and other genetic data analyses).

To deal with poorly posed or ill-posed regression problems, many regularization methods have been
proposed, like ridge regression (Hoerl and Kennard, 1970) and the Lasso (Tibshirani, 1996). Recently,
the interest in frequentist regularization methods has increased and this has produced a flurry of
methods (see, among others, Candes and Tao, 2007, Zou and Hastie, 2005, Zou, 2006, Yuan and
Lin, 2007).
However, a natural approach for regularization is to follow the Bayesian paradigm, as demonstrated
recently by the Bayesian Lasso of Park and Casella (2008). The amount of literature on
Bayesian variable selection is quite enormous (a small subset of which is, for instance, Mitchell and
Beauchamp, 1988, George and McCulloch, 1993, Chipman, 1996, Smith and Kohn, 1996, George
and McCulloch, 1997, Dupuis and Robert, 2003, Brown and Vannucci, 1998, Philips and Guttman,
1998, George, 2000, Kohn et al., 2001, Nott and Green, 2004, Schneider and Corcoran, 2004, Casella
and Moreno, 2006, Cui and George, 2008, Liang et al., 2008, Bottolo and Richardson, 2010). The
number of approaches and scenarios that have been advanced to undertake the selection of the most
relevant variables given a set of observations is quite large, presumably due to the vague decisional
setting induced by the question "Which variables do matter?" Such a variety of resolutions signals a
lack of agreement between the actors in the field.



Most of the solutions, including Liang et al. (2008) and Bottolo and Richardson (2010), focus
on the use of the g-prior, introduced by Zellner (1986). While this prior has a long history and
while it reduces the prior input to a single scalar, g, the influence of this remaining prior factor
is long-lasting, and large values of g are no guarantee of negligible effects, in connection with the
Bartlett or Lindley-Jeffreys paradoxes (Bartlett, 1957, Lindley, 1957, Robert, 1993), as illustrated
for instance in Celeux et al. (2006) or Marin and Robert (2007). In order to alleviate this influence,
some empirical Bayes (Cui and George, 2008) and hierarchical Bayes (Zellner and Siow, 1980,
Celeux et al., 2006, Marin and Robert, 2007, Liang et al., 2008, Bottolo and Richardson, 2010)
solutions have been proposed. In this paper, we pay special attention to two calibration-free
hierarchical Zellner g-priors. The first one is the Jeffreys prior, which is not location invariant. The
second one avoids this problem by only considering models with at least one variable in the model.

The purpose of our paper is to compare the frequentist and the Bayesian points of view in
regularization when $n$ remains (slightly) greater than $p$, so that we limit our attention to full-rank
models. This comparison is considered from both the predictive and the explicative points of view.
The outcome of this study is that the Bayesian methods perform quite similarly to one another while
dominating their frequentist counterparts.
The plan of the paper is as follows: we recall the details of Zellner's (1986) original g-prior in
Section 2 and discuss therein the potential choices of g. We present hierarchical noninformative
alternatives in Section 3. Section 4 compares the results of Bayesian and frequentist methods on
simulated and real datasets. Section 5 concludes the paper.



2 Zellner's g-priors 

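Following standard notations, we introduce a variable $\gamma \in \Gamma = \{0, 1\}^p$ that indicates which
variables are active in the regression, excluding the constant vector corresponding to the intercept,
which is assumed to be always present in the linear regression model. We observe
$y, x_1, \ldots, x_p \in \mathbb{R}^n$, and the model $\mathcal{M}_\gamma$ is defined as the conditional distribution

$$y \mid X, \gamma, \beta_\gamma, \sigma^2 \sim \mathcal{N}_n\left(X_\gamma \beta_\gamma, \sigma^2 I_n\right) \qquad (1)$$

where
► $p_\gamma = \sum_{i=1}^p \gamma_i$ is the number of active variables;
► $X_\gamma$ is the $(n, p_\gamma + 1)$ matrix whose columns are made of the vector $1_n$ and of the variables $x_i$
for which $\gamma_i = 1$;
► $\beta_\gamma \in \mathbb{R}^{p_\gamma + 1}$ and $\sigma^2 \in \mathbb{R}_+^*$ are unknown parameters.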

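The same symbol for the parameter $\sigma^2$ is used across all models. For model $\mathcal{M}_\gamma$, Zellner's g-prior
is given by

$$\beta_\gamma \mid X, \gamma, \sigma^2 \sim \mathcal{N}_{p_\gamma+1}\left(\tilde\beta_\gamma,\; g_\gamma \sigma^2 (X_\gamma' X_\gamma)^{-1}\right), \qquad \pi(\sigma^2 \mid X, \gamma) \propto \sigma^{-2}.$$

The experimenter chooses the prior expectation $\tilde\beta_\gamma$ and $g_\gamma$. For such a prior, we obtain the classical
weighted average between the prior mean and the least-squares estimate $\hat\beta_\gamma = (X_\gamma' X_\gamma)^{-1} X_\gamma' y$:

$$\mathbb{E}(\beta_\gamma \mid X, \gamma, y) = \frac{\tilde\beta_\gamma + g_\gamma \hat\beta_\gamma}{g_\gamma + 1}.$$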
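The following minimal sketch computes this weighted average numerically for a given design. It assumes the design matrix already contains the intercept column $1_n$ and uses NumPy's least-squares solver for $\hat\beta_\gamma$. The function name `gprior_posterior_mean` is hypothetical.

```python
import numpy as np

def gprior_posterior_mean(X_gamma, y, g, beta_prior):
    """Posterior mean of beta_gamma under Zellner's g-prior (illustrative sketch).

    X_gamma    : (n, p_gamma + 1) design matrix, first column assumed to be 1_n
    g          : the scale factor g_gamma
    beta_prior : the prior mean tilde-beta_gamma
    Returns (tilde-beta + g * beta_hat) / (g + 1), beta_hat being the LS estimate.
    """
    beta_hat, *_ = np.linalg.lstsq(X_gamma, y, rcond=None)  # least-squares estimate
    return (np.asarray(beta_prior, dtype=float) + g * beta_hat) / (g + 1.0)

# Toy data lying exactly on y = 1 + 2x, so that beta_hat = (1, 2).
X = np.column_stack([np.ones(3), [0.0, 1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(gprior_posterior_mean(X, y, g=3.0, beta_prior=np.zeros(2)))  # shrunk towards 0
```

With a zero prior mean, the posterior mean is the least-squares estimate shrunk by the factor $g_\gamma/(g_\gamma+1)$. Large values of $g_\gamma$ therefore correspond to weak prior information.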

This prior is traditionally called Zellner's g-prior in the Bayesian folklore because of the use of the
constant $g_\gamma$ by Zellner (1986) in front of the inverse Fisher information matrix $\sigma^2 (X_\gamma' X_\gamma)^{-1}$.
Its appeal is that, by using the information matrix as a global scale,

► it avoids the specification of a whole prior covariance matrix, which would be a tremendous
task;

► it allows for a specification of the constant $g_\gamma$ in terms of observational units, or virtual prior
pseudo-observations in the sense of de Finetti (1972).



However, a fundamental feature of the g-prior is that it is improper, due to the use of an
infinite mass on $\sigma^2$. From a theoretical point of view, this should jeopardize the use of posterior
model probabilities, since these probabilities are not uniquely scaled under improper priors: there
is no way of eliminating the residual constant factor in those priors (DeGroot, 1973, Kass and
Raftery, 1995, Robert, 2001). Nevertheless, since $\sigma^2$ has a meaning common to all models $\mathcal{M}_\gamma$,
Berger et al. (1998) develop a framework that allows one to work with a single improper prior that
is common to all models (see also Marin and Robert, 2007). A fundamental appeal of Zellner's
g-prior in model comparison, and in particular in variable selection, is its simplicity, since it
reduces the prior input to the sole specification of a scale parameter g.



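At this stage, we need to point out that an alternative g-prior is often used (Berger et al., 1998,
Fernandez et al., 2001, Liang et al., 2008, Bottolo and Richardson, 2010), which singles out
the intercept parameter in the linear regression. By first assuming a centering of the covariates,
i.e. $1_n' x_j = 0$ for all $j$'s, the intercept $\alpha$ is given a flat prior while the other parameters of $\beta_\gamma$ are
associated with a corresponding g-prior. This is thus an alternative to model $\mathcal{M}_\gamma$, which we denote
by model $\mathcal{M}_\gamma^{nv}$ to stress the distinction between both representations, and which is such that

$$y \mid X, \gamma, \alpha, \beta_\gamma^{nv}, \sigma^2 \sim \mathcal{N}_n\left(\alpha 1_n + X_\gamma^{nv} \beta_\gamma^{nv}, \sigma^2 I_n\right) \qquad (2)$$

where

► $X_\gamma^{nv}$ is the $(n, p_\gamma)$ matrix whose columns are made of the variables $x_i$ for which $\gamma_i = 1$;

► $\alpha \in \mathbb{R}$, $\beta_\gamma^{nv} \in \mathbb{R}^{p_\gamma}$ and $\sigma^2 \in \mathbb{R}_+^*$ are unknown parameters.

The parameters $\sigma^2$ and $\alpha$ are denoted the same way across all models and rely on the same prior.
Namely, for model $\mathcal{M}_\gamma^{nv}$, the corresponding Zellner's g-prior is given by

$$\beta_\gamma^{nv} \mid X, \gamma, \sigma^2 \sim \mathcal{N}_{p_\gamma}\left(\tilde\beta_\gamma^{nv},\; g_\gamma \sigma^2 \left((X_\gamma^{nv})' X_\gamma^{nv}\right)^{-1}\right), \qquad \pi(\alpha, \sigma^2 \mid X, \gamma) \propto \sigma^{-2}.$$

In that case, we obtain

$$\mathbb{E}(\beta_\gamma^{nv} \mid X, \gamma, y) = \frac{\tilde\beta_\gamma^{nv} + g_\gamma \hat\beta_\gamma^{nv}}{g_\gamma + 1} \quad\text{and}\quad \mathbb{E}(\alpha \mid X, \gamma, y) = \bar y = \frac{1}{n} \sum_{i=1}^n y_i,$$

where $\hat\beta_\gamma^{nv}$ denotes the least-squares estimate in model $\mathcal{M}_\gamma^{nv}$.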



For models $\mathcal{M}_\gamma$ and $\mathcal{M}_\gamma^{nv}$, in a noninformative setting, we can for instance choose
$\tilde\beta_\gamma = 0_{p_\gamma+1}$ or $\tilde\beta_\gamma^{nv} = 0_{p_\gamma}$ and $g_\gamma$ large. However, as pointed out in Marin and Robert
(2007, Chapter 3) among others, there is a lasting influence of $g_\gamma$ over the resulting inference, and
it is impossible to "let $g_\gamma$ go to infinity" to eliminate this influence. This is because of the Bartlett
and Lindley-Jeffreys paradoxes (Bartlett, 1957, Lindley, 1957, Robert, 1993): an infinite value of $g_\gamma$
ends up selecting the null model, regardless of the information brought by the data. For this
reason, data-dependent versions of $g_\gamma$ have been proposed with various degrees of justification:



► Kass and Wasserman (1995) use $g_\gamma = n$ so that the amount of information about the parameters
contained in the prior equals the amount of information brought by one observation.
As shown by Foster and George (1994), for $n$ large enough this perspective is very close to
using the Schwarz (Kass and Wasserman, 1995) or BIC criterion, in that the log-posterior
corresponding to $g = n$ is equal to the penalized log-likelihood of this criterion.

► Foster and George (1994) and George and Foster (2000) propose $g_\gamma = p^2$, in connection with
the Risk Inflation Criterion (RIC) that penalizes the regression sum of squares.

► Fernandez et al. (2001) gather both perspectives in $g_\gamma = \max(n, p^2)$ as a conservative bridge
between BIC and RIC, a choice that they christened the "benchmark prior".

► George and Foster (2000) and Cui and George (2008) resort to empirical Bayes techniques.

These solutions, while commendable since based on asymptotic properties (see in particular
Fernandez et al., 2001 for consistency results), are nonetheless unsatisfactory in that they depend
on the sample size and involve a degree of arbitrariness.
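The sketch below illustrates both the paradox and the role of these default calibrations. It evaluates the closed-form Bayes factor of a model $\mathcal{M}_\gamma^{nv}$ against the null model under a fixed g, in the form given by Liang et al. (2008): $BF = (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1-R_\gamma^2)]^{-(n-1)/2}$, with $R_\gamma^2$ the usual coefficient of determination. The function names and the numerical values ($n = 15$, $p = 10$, $p_\gamma = 4$, $R_\gamma^2 = 0.8$) are illustrative assumptions.

```python
import math

def log_bayes_factor(n, p_gamma, r2, g):
    """log Bayes factor of M_gamma^{nv} against the null model, for a fixed g.

    Closed form from Liang et al. (2008):
    BF = (1 + g)^((n - 1 - p_gamma)/2) * (1 + g (1 - R^2))^(-(n - 1)/2).
    """
    return (0.5 * (n - 1 - p_gamma) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))

def default_g_values(n, p):
    """The data-dependent calibrations of g listed above."""
    return {"BIC": n, "RIC": p ** 2, "benchmark": max(n, p ** 2)}

# Illustrative values (assumptions): n = 15 observations, p = 10 candidates,
# a model with p_gamma = 4 active variables and R^2 = 0.8.
n, p, p_gamma, r2 = 15, 10, 4, 0.8
for name, g in default_g_values(n, p).items():
    print(f"{name:>9}: g = {g:>6}, log BF = {log_bayes_factor(n, p_gamma, r2, g):.3f}")
for g in (1e4, 1e8, 1e12):  # letting g grow: the null model ends up preferred
    print(f"g = {g:.0e}: log BF = {log_bayes_factor(n, p_gamma, r2, g):.3f}")
```

With these illustrative values, the log Bayes factor is positive for the three default calibrations. It becomes negative once g reaches $10^4$ and keeps decreasing as g grows. For any $R_\gamma^2 < 1$ the null model is eventually preferred, which is the Lindley-Jeffreys behaviour described above.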



3 Mixtures of g-priors



The most natural Bayesian approach to solving the uncertainty on the parameter $g_\gamma = g$ is to put
a hyperprior on this parameter:

► This was implicitly proposed by Zellner and Siow (1980), since those authors introduced
Cauchy priors on the $\beta_\gamma$'s, which corresponds to a g-prior augmented by an inverse Gamma
$\mathcal{IG}(1/2, n/2)$ prior on $g$ (equivalently, a $\mathcal{G}a(1/2, n/2)$ prior on $1/g$).

► For model $\mathcal{M}_\gamma^{nv}$, Liang et al. (2008), Cui and George (2008) and Bottolo and Richardson (2010) use

$$\beta_\gamma^{nv} \mid X, \gamma, g, \sigma^2 \sim \mathcal{N}_{p_\gamma}\left(0_{p_\gamma},\; g \sigma^2 \left((X_\gamma^{nv})' X_\gamma^{nv}\right)^{-1}\right)$$

and a hyperprior of the form

$$\pi(\alpha, \sigma^2, g \mid X, \gamma) \propto (1 + g)^{-a/2} \sigma^{-2},$$

with $a > 2$. This constraint on $a$ is due to the fact that the hyperprior on $g$ must be proper, in
connection with the separate processing of the intercept $\alpha$ and the use of a Lebesgue measure
as a prior on $\alpha$. We note that $a$ needs to be specified, $a = 3$ and $a = 4$ being the solutions
favored by Liang et al. (2008).

► For model $\mathcal{M}_\gamma$, Celeux et al. (2006) and Marin and Robert (2007) used

$$\beta_\gamma \mid X, \gamma, g, \sigma^2 \sim \mathcal{N}_{p_\gamma+1}\left(0_{p_\gamma+1},\; g \sigma^2 (X_\gamma' X_\gamma)^{-1}\right)$$

and a hyperprior of the form

$$\pi(\sigma^2, g \mid X) \propto \sigma^{-2} g^{-1} \mathbb{I}_{\mathbb{N}^*}(g).$$

The choice of the integer support is mostly computational. The Jeffreys-like $1/g$ shape is not
formally justified, although the authors argue that it is appropriate for a scale parameter.

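For model $\mathcal{M}_\gamma$, a more convincing modelling is possible since the Jeffreys prior is available.
Indeed, if

$$\beta_\gamma \mid X, \gamma, g, \sigma^2 \sim \mathcal{N}_{p_\gamma+1}\left(0_{p_\gamma+1},\; g \sigma^2 (X_\gamma' X_\gamma)^{-1}\right),$$

then

$$y \mid X, \gamma, g, \sigma^2 \sim \mathcal{N}_n\left(0_n,\; \sigma^2 \left(I_n - \frac{g}{g+1} P_\gamma\right)^{-1}\right) = \mathcal{N}_n\left(0_n,\; \sigma^2 (I_n + g P_\gamma)\right),$$

where $P_\gamma$ is the orthogonal projector on the linear subspace spanned by the columns of $X_\gamma$. Since
the Fisher information matrix is

$$\mathcal{I}(\sigma^2, g) = \frac{1}{2} \begin{pmatrix} n/\sigma^4 & (p_\gamma+1)/(\sigma^2(g+1)) \\ (p_\gamma+1)/(\sigma^2(g+1)) & (p_\gamma+1)/(g+1)^2 \end{pmatrix},$$

the corresponding Jeffreys prior on $(\sigma^2, g)$ is

$$\pi(\sigma^2, g \mid X) \propto \sigma^{-2} (g+1)^{-1}.$$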
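The entries of $\mathcal{I}(\sigma^2, g)$ and the resulting prior can be verified with the usual Gaussian identity $\mathcal{I}_{ij} = \tfrac12 \operatorname{tr}(\Sigma^{-1}\partial_i\Sigma\,\Sigma^{-1}\partial_j\Sigma)$. It is applied here to the marginal covariance $\Sigma = \sigma^2(I_n + gP_\gamma)$, using $P_\gamma^2 = P_\gamma$ and $\operatorname{tr} P_\gamma = p_\gamma + 1$ (full-rank $X_\gamma$). This is a short verification sketch, not a derivation reproduced from the cited references.

```latex
\begin{aligned}
\Sigma &= \sigma^2 (I_n + g P_\gamma), \qquad
\Sigma^{-1} = \sigma^{-2}\Big(I_n - \tfrac{g}{g+1} P_\gamma\Big), \\
\Sigma^{-1}\,\partial_{\sigma^2}\Sigma &= \sigma^{-2} I_n, \qquad
\Sigma^{-1}\,\partial_{g}\Sigma = \Big(I_n - \tfrac{g}{g+1} P_\gamma\Big) P_\gamma = \tfrac{1}{g+1} P_\gamma, \\
\mathcal{I}_{\sigma^2\sigma^2} &= \tfrac12 \operatorname{tr}\big(\sigma^{-4} I_n\big) = \frac{n}{2\sigma^4}, \qquad
\mathcal{I}_{\sigma^2 g} = \tfrac12 \operatorname{tr}\Big(\frac{P_\gamma}{\sigma^2 (g+1)}\Big) = \frac{p_\gamma + 1}{2\sigma^2 (g+1)}, \\
\mathcal{I}_{gg} &= \tfrac12 \operatorname{tr}\Big(\frac{P_\gamma}{(g+1)^2}\Big) = \frac{p_\gamma + 1}{2 (g+1)^2}, \\
\det \mathcal{I}(\sigma^2, g) &= \frac{(p_\gamma + 1)(n - p_\gamma - 1)}{4\,\sigma^4 (g+1)^2}
\quad\Longrightarrow\quad
\sqrt{\det \mathcal{I}(\sigma^2, g)} \propto \sigma^{-2} (g+1)^{-1}.
\end{aligned}
```

The determinant is positive as soon as $n > p_\gamma + 1$, which holds in the full-rank setting considered in this paper.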



Note that, for model $\mathcal{M}_\gamma^{nv}$, Liang et al. (2008) discuss the choice $a = 2$, that is
$\pi(\alpha, \sigma^2, g \mid X, \gamma) \propto \sigma^{-2} (1+g)^{-1}$, as leading to the reference prior and the Jeffreys prior,
presumably also under the marginal model after integrating out $\beta_\gamma^{nv}$, although details are not given.

For such a prior modelling, there exists a closed-form representation for posterior quantities, in
that

$$\pi(\gamma, g \mid X, y) \propto (g+1)^{n/2 - (p_\gamma+1)/2 - 1} \left(1 + g\left(1 - y'P_\gamma y / y'y\right)\right)^{-n/2}$$

and

$$\pi(\gamma \mid X, y) \propto \frac{{}_2F_1\left(n/2, 1; (p_\gamma+3)/2; y'P_\gamma y / y'y\right)}{p_\gamma + 1} \qquad (3)$$

where ${}_2F_1$ is the Gaussian hypergeometric function (Butler and Wood, 2002). We can thus proceed
to undertake Bayesian variable selection without resorting at all to numerical methods (Marin and
Robert, 2007). Moreover, the shrinkage factor due to the Bayesian modelling can also be expressed
in closed form as

$$\mathbb{E}\left(\frac{g}{g+1} \,\middle|\, X, \gamma, y\right) = \frac{\int_0^\infty g\,(g+1)^{n/2 - (p_\gamma+1)/2 - 2} \left(1 + g(1 - y'P_\gamma y/y'y)\right)^{-n/2} dg}{\int_0^\infty (g+1)^{n/2 - (p_\gamma+1)/2 - 1} \left(1 + g(1 - y'P_\gamma y/y'y)\right)^{-n/2} dg} = \frac{2\; {}_2F_1\left(n/2, 2; (p_\gamma+3)/2 + 1; y'P_\gamma y/y'y\right)}{(p_\gamma+3)\; {}_2F_1\left(n/2, 1; (p_\gamma+3)/2; y'P_\gamma y/y'y\right)}.$$
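The closed forms above can be checked numerically with the minimal pure-Python sketch below; it is an illustration, not the authors' code.

- It evaluates ${}_2F_1$ by its power series. This is adequate when $r = y'P_\gamma y/y'y$ is not too close to one; a library routine such as `scipy.special.hyp2f1` would be the usual choice in practice.
- It compares (3) and the shrinkage factor with a midpoint-rule integration after the change of variable $t = g/(g+1)$. Under this change the integrand becomes $(1-t)^{(p_\gamma-1)/2}(1 - r t)^{-n/2}$ on $(0, 1)$.
- The proportionality in (3) drops a constant factor 2, so the integral equals twice `model_weight`.

The function names and the values $n = 15$, $p_\gamma = 3$, $r = 0.6$ are illustrative assumptions.

```python
import math

def hyp2f1(a, b, c, z, rtol=1e-15, max_terms=200_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) via its power series.

    Only meant for 0 <= z < 1 and positive a, b, c; convergence slows
    down as z approaches 1.
    """
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= rtol * abs(total):
            return total
    raise RuntimeError("2F1 series did not converge")

def model_weight(n, p_gamma, r):
    """Unnormalised pi(gamma | X, y) of equation (3); r = y'P_gamma y / y'y."""
    return hyp2f1(n / 2, 1.0, (p_gamma + 3) / 2, r) / (p_gamma + 1)

def shrinkage(n, p_gamma, r):
    """Closed-form posterior mean of g / (g + 1) given gamma."""
    c = (p_gamma + 3) / 2
    return (2 * hyp2f1(n / 2, 2.0, c + 1, r)
            / ((p_gamma + 3) * hyp2f1(n / 2, 1.0, c, r)))

def by_quadrature(n, p_gamma, r, m=100_000):
    """Midpoint-rule check in t = g / (1 + g), assuming p_gamma >= 1.

    Returns (integral over g of the unnormalised posterior of (gamma, g),
             posterior mean of t = g / (g + 1)).
    """
    h = 1.0 / m
    z0 = z1 = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        w = (1 - t) ** ((p_gamma - 1) / 2) * (1 - r * t) ** (-n / 2)
        z0 += w
        z1 += t * w
    return z0 * h, z1 / z0

n, p_gamma, r = 15, 3, 0.6  # illustrative values
integral, mean_t = by_quadrature(n, p_gamma, r)
print(2 * model_weight(n, p_gamma, r), integral)  # closed form vs quadrature
print(shrinkage(n, p_gamma, r), mean_t)           # closed form vs quadrature
```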

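This obviously leads to straightforward representations for Bayes estimates. Let $X_{new}$ be a $q \times p$
matrix containing $q$ new values of the explanatory variables for which we would like to predict the
corresponding response $y_{new}$. Let $X_{new,\gamma}$ denote the $q \times (p_\gamma+1)$ matrix made of $1_q$ and of the
columns of $X_{new}$ for which $\gamma_i = 1$. Then the Bayesian predictor of $y_{new}$ under model $\mathcal{M}_\gamma$ is given by

$$\hat y_{new}^\gamma = \mathbb{E}[y_{new} \mid X_{new}, X, \gamma, y] = \frac{2\; {}_2F_1\left(n/2, 2; (p_\gamma+3)/2 + 1; y'P_\gamma y/y'y\right)}{(p_\gamma+3)\; {}_2F_1\left(n/2, 1; (p_\gamma+3)/2; y'P_\gamma y/y'y\right)}\, X_{new,\gamma} \hat\beta_\gamma.$$

Similarly, the Bayesian model averaging predictor of $y_{new}$ is given by

$$\hat y_{new} = \mathbb{E}[y_{new} \mid X_{new}, X, y] = \frac{\sum_{\gamma \in \Gamma} \dfrac{2\; {}_2F_1\left(n/2, 2; (p_\gamma+3)/2 + 1; y'P_\gamma y/y'y\right)}{(p_\gamma+1)(p_\gamma+3)}\, X_{new,\gamma} \hat\beta_\gamma}{\sum_{\gamma \in \Gamma} \dfrac{{}_2F_1\left(n/2, 1; (p_\gamma+3)/2; y'P_\gamma y/y'y\right)}{p_\gamma+1}} \qquad (4)$$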
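The following small sketch implements predictor (4) under stated assumptions:

- The series-based ${}_2F_1$ is repeated so that the block is self-contained.
- All $2^p$ models are enumerated, which is only practical for small $p$, as in the simulated examples below.
- A uniform prior over $\gamma$ is assumed, which we take to be implicit in (3).

The function name `bma_predict` and the toy data are hypothetical.

```python
import itertools
import numpy as np

def hyp2f1(a, b, c, z, rtol=1e-15, max_terms=200_000):
    """2F1(a, b; c; z) by its power series (meant for 0 <= z < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= rtol * abs(total):
            return total
    raise RuntimeError("2F1 series did not converge")

def bma_predict(X, y, X_new):
    """Model-averaged predictor (4) under the Jeffreys prior on (sigma^2, g).

    Enumerates all 2^p models, so only meant for small p.
    """
    n, p = X.shape
    q = X_new.shape[0]
    yy = float(y @ y)
    numerator = np.zeros(q)
    denominator = 0.0
    for gamma in itertools.product((0, 1), repeat=p):
        cols = [j for j in range(p) if gamma[j]]
        p_g = len(cols)
        # X_gamma and X_new,gamma both include the intercept column
        Xg = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        Xg_new = np.column_stack([np.ones(q)] + [X_new[:, j] for j in cols])
        beta_hat, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        r = float(y @ (Xg @ beta_hat)) / yy  # y' P_gamma y / y' y
        c = (p_g + 3) / 2
        numerator += (2 * hyp2f1(n / 2, 2.0, c + 1, r)
                      / ((p_g + 1) * (p_g + 3))) * (Xg_new @ beta_hat)
        denominator += hyp2f1(n / 2, 1.0, c, r) / (p_g + 1)
    return numerator / denominator

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.standard_normal(30)  # only x_1 is active
X_new = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(bma_predict(X, y, X_new))
```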
This numerical simplification in the derivation of Bayesian estimates and predictors is found in
Liang et al. (2008) and exploited further in Bottolo and Richardson (2010). Note also that Guo
and Speckman (2009) have established the consistency of the Bayes factors based on such priors.

In contrast with this proposal, the prior of Liang et al. (2008) depends on a tuning parameter $a$.
Despite that, there also exist arguments to support this prior modelling, including the important
issue of invariance under location-scale transforms. As seen in the above formulae, the Jeffreys prior
associated with model $\mathcal{M}_\gamma$ ensures scale invariance but not location invariance. In order to ensure
location invariance for model $\mathcal{M}_\gamma$, it would be necessary to center the observation variable $y$ as well
as the explanatory variables $X$. Obviously, this centering of the data is completely unjustified from
a Bayesian perspective, and further it creates artificial correlations between observations. However,
it could be argued that the lack of location invariance only pertains to quite specific and somewhat
artificial situations, and that it is negligible in most situations. We will return to this point in the
comparison section.

A location-scale invariant alternative consists in using the prior of Liang et al. (2008) with $a = 2$ and
excluding the null model from the competitors. This prior leads to the model posterior probability

$$\pi(\gamma \mid X, y) \propto \frac{{}_2F_1\left((n-1)/2, 1; (p_\gamma+2)/2; (y - \bar y 1_n)'P_\gamma(y - \bar y 1_n) / (y - \bar y 1_n)'(y - \bar y 1_n)\right)}{p_\gamma}, \qquad \gamma \neq 0_p. \qquad (5)$$

Equations (3) and (5) are similar. However, in (5) $y$ is centered, ensuring the
location invariance of the selection procedure.
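The difference between (3) and (5) can be seen directly on the last argument of ${}_2F_1$. In this NumPy sketch (function names hypothetical, toy data simulated), adding a constant to $y$ drives the ratio used in (3) towards one for the model considered. The centered ratio used in (5) is unchanged. In both functions the projector $P_\gamma$ includes the intercept column; for the centered ratio this gives the same value as projecting on centered covariates.

```python
import numpy as np

def r_noncentered(X_gamma, y):
    """Last argument of 2F1 in (3): y'P_gamma y / y'y, P_gamma including 1_n."""
    n = len(y)
    Xg = np.column_stack([np.ones(n), X_gamma])
    fitted = Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0]  # P_gamma y
    return float(y @ fitted) / float(y @ y)

def r_centered(X_gamma, y):
    """Last argument of 2F1 in (5): the same ratio computed on y - ybar 1_n."""
    return r_noncentered(X_gamma, y - y.mean())

rng = np.random.default_rng(1)
X_gamma = rng.standard_normal((15, 2))
y = X_gamma @ np.array([1.0, -0.5]) + rng.standard_normal(15)
for shift in (0.0, 1e2, 1e4):
    print(shift, r_noncentered(X_gamma, y + shift), r_centered(X_gamma, y + shift))
```

As the shift grows, the non-centered ratio tends to one for every model, so (3) loses its ability to discriminate between models. This is the NIMS behaviour examined in the comparison section.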



4 Numerical comparisons 

We present here the results of numerical experiments aiming at comparing the behavior of Bayesian
variable selection and of some popular (non-Bayesian) regularization methods in regression, when
considered from a variable selection point of view. The regularization methods that we consider
are the Lasso, the Dantzig selector and the elastic net, described in Section 4.1. The Bayesian variable
selection procedures we consider differ in their strategies for selecting the hyperparameter g in Zellner's
g-priors. We also include in this comparison the intrinsic prior (Casella and Moreno, 2006), which is
another default objective prior for the noninformative setting that does not require any tuning
parameter and is also invariant under location and scale changes. All procedures under comparison
are described in Table 1. We have also included in this comparison the highly standard AIC and
BIC penalized likelihood criteria. Moreover, we will refer to the performances of an ORACLE
procedure that assumes the true model is known and estimates the regression coefficients by
least squares.



4.1 Regularization methods 



1) The Lasso: Introduced by Tibshirani (1996), the Lasso is a shrinkage method for linear
regression. It is defined as the solution to the following $\ell_1$-penalized least squares optimization
problem

$$\hat\beta^{Lasso} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^p |\beta_j|,$$

where $\lambda$ is a positive tuning parameter.
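The Lasso problem has no closed form in general. For illustration only, the sketch below uses cyclic coordinate descent, one common way to solve it, in which each coefficient is updated by soft-thresholding; it is not necessarily the solver used in the experiments of Section 4.2. The sketch assumes centered data without intercept. It keeps the scaling of the criterion displayed above (no factor 1/2 on the squared loss), hence the threshold $\lambda/2$. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for min_b ||y - X b||_2^2 + lam * ||b||_1.

    No intercept: y and the columns of X are assumed centered beforehand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # ||x_j||^2
    resid = y.copy()               # y - X beta, with beta = 0
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid + col_sq[j] * beta[j]       # x_j' (partial residual)
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]  # lam/2: no 1/2 on the loss
            resid -= X[:, j] * (new - beta[j])
            max_step = max(max_step, abs(new - beta[j]))
            beta[j] = new
        if max_step < tol:
            break
    return beta

# Orthonormal design: the solution soft-thresholds X'y at lam / 2.
print(lasso_cd(np.eye(3), [3.0, -1.0, 0.2], lam=2.0))
```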
2) The Dantzig Selector: Candes and Tao (2007) introduced the Dantzig Selector as an alternative
to the Lasso. The Dantzig Selector is the solution to the optimization problem

$$\min_\beta \|\beta\|_1 \quad \text{subject to} \quad \|X'(y - X\beta)\|_\infty \le \lambda,$$

where $\lambda$ is a positive tuning parameter. The constraint $\|X'(y - X\beta)\|_\infty \le \lambda$ can be viewed
as a relaxation of the normal equations in classical linear regression.
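Because both the objective and the constraint are piecewise linear, the Dantzig selector is a linear program. The sketch below writes $\beta = u - v$ with $u, v \ge 0$ and hands the problem to `scipy.optimize.linprog` with the HiGHS solver. This LP formulation is a standard one, but the choice of SciPy, the function name and the toy data are assumptions, not the implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector as a linear program (illustrative sketch).

    With beta = u - v, u, v >= 0, solves
        min 1'u + 1'v  subject to  -lam <= X'(y - X(u - v)) <= lam.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    G = X.T @ X
    xty = X.T @ y
    c = np.ones(2 * p)
    # X'y - G(u - v) <= lam   <=>  -G u + G v <= lam - X'y
    # G(u - v) - X'y <= lam   <=>   G u - G v <= lam + X'y
    A_ub = np.vstack([np.hstack([-G, G]), np.hstack([G, -G])])
    b_ub = np.concatenate([lam - xty, lam + xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

print(dantzig_selector(np.eye(3), [3.0, -1.0, 0.2], lam=1.0))
```

On an orthonormal design the Dantzig selector soft-thresholds $X'y$ at $\lambda$, whereas the Lasso as displayed above thresholds at $\lambda/2$. The two tuning parameters are therefore not on the same scale.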

3) The Elastic Net (Enet): The Lasso has at least two limitations. First, the Lasso does not encourage
grouped selection in the presence of highly correlated covariates. Second, in the $p > n$ case the
Lasso can select at most $n$ covariates. To overcome these limitations, Zou and Hastie (2005)
proposed the elastic net, which combines both the ridge $\ell_2$ and the Lasso $\ell_1$ penalties, i.e.

$$\hat\beta^{Enet} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^p |\beta_j| + \mu \sum_{j=1}^p \beta_j^2,$$

where $\lambda$ and $\mu$ are two positive tuning parameters.
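The same coordinate-descent idea extends to the elastic net criterion displayed above. Each coordinate is soft-thresholded at $\lambda/2$ and then divided by $\|x_j\|^2 + \mu$, so that $\mu = 0$ recovers the Lasso update. As before, this is a hypothetical illustration assuming centered data without intercept, not the implementation used in the experiments.

```python
import numpy as np

def enet_cd(X, y, lam, mu, n_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for
        min_b ||y - X b||_2^2 + lam * ||b||_1 + mu * ||b||_2^2
    (no intercept; y and the columns of X assumed centered).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ resid + col_sq[j] * beta[j]  # x_j' (partial residual)
            # soft-threshold at lam/2, then shrink through the ridge term mu
            new = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / (col_sq[j] + mu)
            resid -= X[:, j] * (new - beta[j])
            max_step = max(max_step, abs(new - beta[j]))
            beta[j] = new
        if max_step < tol:
            break
    return beta

# Two identical covariates: the elastic net splits the effect evenly between them.
X_dup = np.array([[1.0, 1.0], [2.0, 2.0]])
print(enet_cd(X_dup, [1.0, 2.0], lam=1.0, mu=1.0))
```

For identical columns the elastic net criterion is strictly convex, and its minimizer gives both columns the same coefficient. This is the grouping effect mentioned above.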



4.2 Numerical experiments on simulated datasets 

We have designed six different simulated datasets as benchmarks, chosen as follows:

1. Example 1 (sparse uncorrelated design) corresponds to an uncorrelated covariate setting
($\rho = 0$), with $p = 10$ predictors, where the components of $x_i$ ($i = 1, \ldots, 10$) are iid $\mathcal{N}_1(0, 1)$
realizations. The response is simulated as

$$y \sim \mathcal{N}_n\left(2 + x_2 + 2x_3 - 2x_6 - 1.5x_7,\; I_n\right).$$

2. Example 2 (sparse correlated design) corresponds to a correlated case ($\rho = 0.9$), with
$p = 10$ predictors and $x_i = (z_i + 3z_{11})/\sqrt{10}$ for $i = 1, 2$, $x_i = (z_i + 3z_{12})/\sqrt{10}$ for $i = 3, 4, 5$,
and $x_i = (z_i + 3z_{13})/\sqrt{10}$ for $i = 6, \ldots, 10$, the components of $z_i$ ($i = 1, \ldots, 13$) being iid
$\mathcal{N}_1(0, 1)$ realizations. The use of common terms in the $x_i$'s obviously induces a correlation
among those $x_i$'s: the correlation between variables $x_1$ and $x_2$ is 0.9, as for the variables ($x_3$,
$x_4$ and $x_5$), and for the variables ($x_6$, $x_7$, $x_8$, $x_9$ and $x_{10}$). There is no correlation between
those three groups of variables. The response is simulated as

$$y \sim \mathcal{N}_n\left(2 + x_2 + 2x_3 - 2x_6 - 1.5x_7,\; I_n\right).$$

3. Example 3 (sparse noisy correlated design) involves $p = 8$ predictors. Those variables
are generated using a multivariate Gaussian distribution with correlations

$$\rho(x_i, x_j) = 0.5^{|i-j|}.$$

The response is simulated as

$$y \sim \mathcal{N}_n\left(3x_1 + 1.5x_2 + 2x_5,\; 9 I_n\right).$$

4. Example 4 (saturated correlated design) is the same as Example 3, except that the
response is simulated as

$$y \sim \mathcal{N}_n\left(0.85 \sum_{i=1}^{8} x_i,\; I_n\right).$$

5. Example 5 involves $p = 9$ predictors. Those variables are generated using a multivariate
Gaussian distribution with correlations

$$\rho(x_i, x_j) = 0.7^{|i-j|}.$$

The response is simulated as

$$y \sim \mathcal{N}_n\left(2x_2 - 3x_4,\; I_n\right).$$

6. Example 6 (null model) involves $p = 8$ predictors. Those variables are generated using a
multivariate Gaussian distribution with correlations

$$\rho(x_i, x_j) = 0.5^{|i-j|}.$$

The response is simulated as

$$y \sim \mathcal{N}_n\left(2,\; 4 I_n\right).$$
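For reproducibility, the six designs can be sketched as follows. This is a minimal NumPy sketch written from the descriptions above, not the code used to produce Tables 2-13. The function names and the use of `numpy.random.Generator` are assumptions, and variable $x_i$ of the text is column $i - 1$ of `X`.

```python
import numpy as np

def toeplitz_design(n, p, rho, rng):
    """Gaussian covariates with corr(x_i, x_j) = rho^|i - j| (Examples 3 to 6)."""
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(np.zeros(p), cov, size=n)

def grouped_design(n, rng):
    """Example 2: x_i = (z_i + 3 z_shared) / sqrt(10), three independent groups."""
    z = rng.standard_normal((n, 13))
    X = np.empty((n, 10))
    # 0-based indices: z_11, z_12, z_13 are columns 10, 11, 12 of z
    for shared, members in ((10, range(0, 2)), (11, range(2, 5)), (12, range(5, 10))):
        for i in members:
            X[:, i] = (z[:, i] + 3.0 * z[:, shared]) / np.sqrt(10.0)
    return X

def simulate(example, n, rng):
    """Return (X, y) for Examples 1-6 of Section 4.2."""
    if example in (1, 2):
        X = rng.standard_normal((n, 10)) if example == 1 else grouped_design(n, rng)
        mean, sd = 2 + X[:, 1] + 2 * X[:, 2] - 2 * X[:, 5] - 1.5 * X[:, 6], 1.0
    elif example == 3:
        X = toeplitz_design(n, 8, 0.5, rng)
        mean, sd = 3 * X[:, 0] + 1.5 * X[:, 1] + 2 * X[:, 4], 3.0  # variance 9
    elif example == 4:
        X = toeplitz_design(n, 8, 0.5, rng)
        mean, sd = 0.85 * X.sum(axis=1), 1.0
    elif example == 5:
        X = toeplitz_design(n, 9, 0.7, rng)
        mean, sd = 2 * X[:, 1] - 3 * X[:, 3], 1.0
    elif example == 6:
        X = toeplitz_design(n, 8, 0.5, rng)
        mean, sd = np.full(n, 2.0), 2.0                            # variance 4
    else:
        raise ValueError("example must be in 1..6")
    return X, mean + sd * rng.standard_normal(n)

rng = np.random.default_rng(2024)
X_train, y_train = simulate(1, 15, rng)  # training set, n = 15
X_test, y_test = simulate(1, 200, rng)   # test set, n_T = 200
```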

Each dataset consists of a training set of size $n = 15$, on which the regression model has been
fitted, and a test set $T$ of size $n_T = 200$ for assessing performances. Tuning parameters in the
Lasso, the Dantzig selector (DZ) and the elastic net (ENET) have been selected by minimizing
the leave-one-out cross-validation prediction error. For each example, 100 independent
datasets have been simulated. We use three measures of performance:

1. The root mean squared error (MSE)

$$\mathrm{MSE}_y = \sqrt{\sum_{i \in T} (\hat y_i - y_i)^2 / n_T},$$

$\hat y_i$ being the prediction of $y_i$ in the test set;

2. HITS: the number of correctly identified influential variables;

3. FP (False Positives): the number of non-influential variables declared as influential.
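For a single simulated dataset, the three measures can be computed as in the short sketch below; the reported values are then averages over the 100 datasets. The function names are hypothetical, and variables are identified by their 1-based indices as in the text.

```python
import math

def rmse(y_pred, y_test):
    """MSE_y of the text: root mean squared prediction error on the test set."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_pred, y_test)) / len(y_test))

def hits_and_fp(selected, influential):
    """HITS and FP for one fitted model, given sets of variable indices."""
    selected, influential = set(selected), set(influential)
    return len(selected & influential), len(selected - influential)

# Example 1: the influential variables are x_2, x_3, x_6 and x_7.
print(hits_and_fp({2, 3, 5}, {2, 3, 6, 7}))  # (2, 1)
```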

Using those six different datasets as benchmarks, we compare the variable selection methods
listed in Table 1. The performances of the above selection methods are summarized in Tables 2-13.

In the Bayesian approaches, the set of variables is naturally selected according to the maximum
posterior probability $\pi(\gamma \mid X, y)$, and the prediction is obtained via the Bayesian model averaging
predictor.



In this numerical experiment, the Bayesian procedures are clearly much more parsimonious
than the regularization procedures, in that they almost always avoid overfitting. In all examples
containing non-influential variables, the number of false positives FP is smaller for the Bayesian
solutions than for the regularization methods. Except for the ZS-F and OVS procedures, which behave
slightly worse than the others, all the Bayesian procedures tested here produce the same selection
of predictors. ZS-F seems to have a slight tendency to select too many variables. The performances
of OVS are somewhat disappointing, and this procedure seems to have a tendency to be too
parsimonious. From a predictive viewpoint, with the MSE computed by model averaging, Bayesian
approaches also perform better than regularization approaches, except for the saturated correlated
example (Example 4). We further note that the classical selection procedures based on AIC and
BIC do not easily reject variables and are thus slightly worse than Bayesian and regularization
procedures (a fact not surprising for AIC). In all examples, the NIMS and HG-2 approaches lead to
close to optimal performances, in that they mostly select the relevant covariates with few false
positives, while achieving close to the minimal root mean squared error among all the Bayesian
solutions we considered. They also do almost systematically better than BIC and AIC.

Method   Description
AIC      Akaike Information Criterion
BIC      Bayesian Information Criterion

BRIC     g-prior with $g = \max(n, p^2)$ (Fernandez et al., 2001)
EB-L     Local EB estimate of g in g-prior (Cui and George, 2008)
EB-G     Global EB estimate of g in g-prior (Cui and George, 2008)
ZS-N     Base model in Bayes factor taken as the null model (Liang et al., 2008)
ZS-F     Base model in Bayes factor taken as the full model (Liang et al., 2008)
OVS      Objective variable selection using the intrinsic prior (Casella and Moreno, 2006)
HG-3     Hyper-g prior with $a = 3$ (Liang et al., 2008)
HG-4     Hyper-g prior with $a = 4$ (Liang et al., 2008)

HG-2     Hyper-g prior with $a = 2$ (Liang et al., 2008), null model excluded
NIMS     Jeffreys prior on the non-invariant model

LASSO    Lasso (Tibshirani, 1996)
DZ       The Dantzig Selector (Candes and Tao, 2007)
ENET     The elastic net (Zou and Hastie, 2005)

Table 1: Acronyms and description for the variable selection methods compared in the numerical
experiment. (The blocks separate the methods by their nature.)

A global remark about this comparison is that all Bayesian procedures have a very similar MSE,
and thus that they all correspond to the same regularization effect, except for OVS, which does
systematically worse. However, it is important to notice that the MSE for OVS has not been
computed by model averaging, but by using the best model. Apart from OVS, it would be hazardous
to recommend one of the priors from those simulations, since there is no noticeable difference between
them from either the selection or the prediction point of view.



Translating the data. Since NIMS is not location invariant, it is important to measure the
impact of adding a constant to all observations. As stressed by a reviewer, when this constant goes
to infinity, keeping $n$ fixed, the last argument of ${}_2F_1$ in (3) goes to one for all models. Thus, if the
empirical mean is large relative to the regression sum of squares, the data end up having little input
in distinguishing between models. In order to measure this possible negative impact of adding a
large constant, we replace $y$ in Example 1 by $\tilde y = y + 10^k\,\mathrm{RSS}$ (Regression Sum of Squares) for
$k \in \{1, 2, 3\}$. The results derived from the NIMS criterion are summarized in Tables 14 and 15. As
predicted, the NIMS criterion tends to choose the null model as $k$ increases and the null model with
Method   MSE_y        HITS         FP
ORACLE   1.24(0.02)   4.00(0.00)   0.00(0.00)
AIC      1.75(0.08)   3.94(0.02)   2.78(0.17)
BIC      1.69(0.08)   3.90(0.03)   2.29(0.17)
BRIC     ?            ?            ?
EB-L     1.46(0.04)   3.80(0.04)   0.66(0.09)
EB-G     1.45(0.04)   3.78(0.04)   0.65(0.09)
ZS-N     1.44(0.03)   3.78(0.04)   0.65(0.09)
ZS-F     1.49(0.03)   3.90(0.03)   1.73(0.14)
OVS      1.52(0.06)   3.63(0.06)   0.54(0.09)
HG-3     1.49(0.04)   3.75(0.05)   0.55(0.09)
HG-4     1.57(0.04)   3.65(0.05)   0.54(0.08)
HG-2     1.50(0.04)   3.75(0.05)   0.59(0.09)
NIMS     1.45(0.03)   3.75(0.05)   0.57(0.08)
LASSO    1.67(0.05)   3.89(0.03)   2.68(0.20)
DZ       1.66(0.06)   3.72(0.07)   2.41(0.15)
ENET     1.72(0.05)   3.89(0.04)   2.79(0.29)

Table 2: Example 1: Mean of MSE, HITS and FP. The numbers between parentheses are the
corresponding standard errors.



Variables  1     2     3     4     5     6     7     8     9     10
AIC        0.47  0.95  1.00  0.45  0.44  0.99  1.00  0.46  0.52  0.44
BIC        0.41  0.91  1.00  0.38  0.40  0.99  1.00  0.32  0.44  0.34
BRIC       0.18  0.77  1.00  0.10  0.11  0.99  0.99  0.07  0.10  0.09
EB-L       0.17  0.81  1.00  0.11  0.11  0.99  1.00  0.07  0.11  0.09
EB-G       0.17  0.79  1.00  0.11  0.11  0.99  1.00  0.07  0.10  0.09
ZS-N       0.17  0.79  1.00  0.11  0.11  0.99  1.00  0.07  0.10  0.09
ZS-F       0.34  0.90  1.00  0.29  0.33  1.00  1.00  0.20  0.33  0.24
OVS        0.14  0.72  0.98  0.07  0.08  0.97  0.96  0.08  0.10  0.07
HG-3       0.17  0.77  1.00  0.11  0.10  0.99  0.99  0.07  0.09  0.08
HG-4       0.15  0.77  1.00  0.10  0.08  0.99  0.99  0.07  0.08  0.07
HG-2       0.10  0.83  0.99  0.07  0.16  0.98  0.95  0.13  0.06  0.07
NIMS       0.15  0.77  1.00  0.09  0.09  0.99  0.99  0.06  0.10  0.08
LASSO      0.49  0.91  1.00  0.41  0.45  0.98  1.00  0.49  0.47  0.37
DZ         0.42  0.84  0.96  0.41  0.47  0.97  0.95  0.38  0.37  0.36
ENET       0.45  0.93  1.00  0.45  0.43  0.99  0.97  0.52  0.44  0.50

Table 3: Example 1: Relative frequencies of the selected variables for methods under comparison.
Method   MSE_y        HITS         FP
ORACLE   1.19(0.01)   4.00(0.00)   0.00(0.00)
AIC      1.81(0.06)   3.12(0.08)   2.75(0.16)
BIC      1.76(0.05)   2.97(0.09)   2.39(0.16)
BRIC     ?            ?            ?
EB-L     1.45(0.02)   2.43(0.10)   1.03(0.10)
EB-G     1.45(0.02)   2.42(0.10)   0.95(0.10)
ZS-N     1.45(0.02)   2.43(0.10)   1.03(0.10)
ZS-F     1.42(0.02)   2.97(0.08)   2.18(0.10)
OVS      1.71(0.04)   2.16(0.11)   1.09(0.09)
HG-3     1.45(0.02)   2.32(0.11)   0.96(0.10)
HG-4     1.45(0.02)   2.35(0.10)   0.86(0.09)
HG-2     1.52(0.04)   2.35(0.10)   0.81(0.09)
NIMS     1.45(0.02)   2.42(0.10)   0.96(0.09)
LASSO    1.66(0.05)   3.35(0.09)   2.95(0.15)
DZ       1.59(0.03)   2.83(0.09)   2.23(0.10)
ENET     1.50(0.03)   3.70(0.07)   4.36(0.17)

Table 4: Example 2: Mean of MSE, HITS and FP. The numbers between parentheses are the
corresponding standard errors.



Variables  1     2     3     4     5     6     7     8     9     10
AIC        0.46  0.79  0.88  0.44  0.46  0.78  0.67  0.52  0.48  0.39
BIC        0.41  0.71  0.86  0.43  0.33  0.77  0.63  0.42  0.45  0.35
BRIC       0.21  0.60  0.80  0.17  0.13  0.65  0.39  0.18  0.18  0.12
EB-L       0.22  0.59  0.80  0.17  0.14  0.66  0.38  0.19  0.19  0.12
EB-G       0.21  0.59  0.81  0.16  0.13  0.65  0.37  0.19  0.16  0.10
ZS-N       0.22  0.59  0.80  0.17  0.14  0.66  0.38  0.19  0.19  0.12
ZS-F       0.40  0.72  0.84  0.37  0.31  0.79  0.62  0.38  0.41  0.31
OVS        0.23  0.44  0.74  0.17  0.23  0.62  0.36  0.19  0.18  0.09
HG-3       0.21  0.54  0.80  0.16  0.13  0.63  0.35  0.18  0.18  0.10
HG-4       0.18  0.56  0.81  0.15  0.11  0.63  0.35  0.17  0.17  0.08
HG-2       0.22  0.60  0.78  0.16  0.13  0.59  0.42  0.10  0.15  0.11
NIMS       0.19  0.59  0.80  0.16  0.14  0.66  0.37  0.19  0.18  0.10
LASSO      0.47  0.77  0.90  0.53  0.40  0.89  0.79  0.57  0.55  0.43
DZ         0.40  0.65  0.79  0.46  0.37  0.76  0.63  0.32  0.36  0.32
ENET       0.68  0.85  0.97  0.74  0.74  0.96  0.92  0.76  0.75  0.69

Table 5: Example 2: Relative frequencies of the selected variables for methods under comparison.
Method   MSE_y        HITS         FP
ORACLE   3.31(0.03)   3.00(0.00)   0.00(0.00)
AIC      4.32(0.09)   2.11(0.07)   2.06(0.14)
BIC      4.24(0.08)   1.97(0.07)   1.68(0.14)
BRIC     ?            ?            ?
EB-L     4.06(0.06)   1.84(0.07)   0.79(0.09)
EB-G     4.07(0.07)   1.88(0.07)   0.83(0.09)
ZS-N     4.01(0.06)   1.81(0.07)   0.76(0.09)
ZS-F     4.04(0.07)   2.10(0.07)   1.26(0.11)
OVS      4.27(0.09)   1.78(0.07)   0.64(0.09)
HG-3     4.05(0.06)   1.81(0.07)   0.77(0.09)
HG-4     4.08(0.06)   1.84(0.07)   0.78(0.09)
HG-2     3.98(0.05)   1.80(0.08)   0.73(0.10)
NIMS     3.99(0.06)   1.83(0.07)   0.77(0.09)
LASSO    4.03(0.06)   2.33(0.07)   1.61(0.16)
DZ       4.32(0.10)   2.20(0.11)   2.06(0.16)
ENET     4.13(0.06)   2.38(0.06)   2.04(0.16)

Table 6: Example 3: Mean of MSE, HITS and FP. The numbers between parentheses are the
corresponding standard errors.



Variables  1     2     3     4     5     6     7     8
AIC        0.89  0.52  0.45  0.43  0.70  0.36  0.42  0.40
BIC        0.89  0.44  0.39  0.36  0.64  0.30  0.33  0.30
BRIC       0.82  0.35  0.09  0.13  0.49  0.12  0.08  0.11
EB-L       0.87  0.38  0.13  0.19  0.59  0.18  0.14  0.15
EB-G       0.89  0.39  0.15  0.20  0.60  0.18  0.14  0.16
ZS-N       0.87  0.37  0.13  0.19  0.57  0.16  0.13  0.15
ZS-F       0.92  0.51  0.23  0.34  0.67  0.22  0.24  0.23
OVS        0.86  0.37  0.12  0.14  0.55  0.16  0.08  0.14
HG-3       0.87  0.38  0.13  0.19  0.56  0.16  0.14  0.15
HG-4       0.88  0.38  0.13  0.19  0.58  0.17  0.14  0.15
HG-2       0.80  0.46  0.19  0.17  0.60  0.21  0.12  0.15
NIMS       0.87  0.38  0.12  0.19  0.58  0.17  0.14  0.15
LASSO      0.96  0.70  0.32  0.40  0.67  0.29  0.23  0.37
DZ         0.82  0.71  0.42  0.47  0.67  0.47  0.31  0.39
ENET       0.97  0.71  0.49  0.50  0.70  0.40  0.30  0.35



Table 7: Example 3: Relative frequencies of the selected variables for methods under comparison. 
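The relative frequencies reported in these tables can be read as the proportion of replications in which a variable belongs to the selected subset. The sketch below computes them under the assumption that each method returns one selected subset per replication; how the Bayesian procedures reduce their posterior output to a single subset is not restated here, and the function name is hypothetical.

```python
from collections import Counter

def selection_frequencies(selected_sets, p):
    """Relative selection frequency of variables 1..p over replications.

    selected_sets: one collection of selected (1-based) variable indices
    per replication.
    """
    counts = Counter()
    for s in selected_sets:
        counts.update(set(s))  # a variable counts at most once per replication
    n_rep = len(selected_sets)
    return [counts[j] / n_rep for j in range(1, p + 1)]

# Hypothetical selections over four replications with p = 3 candidates.
reps = [{1, 3}, {1}, {1, 2, 3}, {3}]
print(selection_frequencies(reps, 3))  # -> [0.75, 0.25, 0.75]
```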
Method   MSE_y        HITS         FP
ORACLE   1.43(0.03)   8.00(0.00)   0.00(0.00)
AIC      1.60(0.03)   6.32(0.11)   0.00(0.00)
BIC      1.64(0.03)   5.99(0.12)   0.00(0.00)
BRIC     [illegible]  [illegible]  [illegible]
EB-L     1.75(0.04)   4.39(0.10)   0.00(0.00)
EB-G     1.76(0.04)   4.34(0.10)   0.00(0.00)
ZS-N     1.74(0.04)   4.38(0.10)   0.00(0.00)
ZS-F     1.62(0.04)   5.37(0.10)   0.00(0.00)
OVS      2.22(0.04)   3.82(0.10)   0.00(0.00)
HG-3     1.76(0.04)   4.32(0.10)   0.00(0.00)
HG-4     1.78(0.03)   4.19(0.09)   0.00(0.00)
HG-2     1.77(0.04)   4.18(0.11)   0.00(0.00)
NIMS     1.75(0.04)   4.39(0.10)   0.00(0.00)
LASSO    1.59(0.04)   7.13(0.12)   0.00(0.00)
DZ       1.56(0.03)   6.82(0.11)   0.00(0.00)
ENET     1.54(0.03)   7.53(0.08)   0.00(0.00)



Table 8: Example 4: Mean of MSE, HITS and FP. The numbers between parentheses are the 
corresponding standard errors. 



Variables  1     2     3     4     5     6     7     8
AIC        0.80  0.81  0.78  0.75  0.76  0.86  0.77  0.79
BIC        0.76  0.76  0.75  0.72  0.68  0.83  0.71  0.78
BRIC       0.45  0.58  0.50  0.65  0.54  0.55  0.48  0.60
EB-L       0.46  0.57  0.52  0.67  0.54  0.54  0.50  0.59
EB-G       0.45  0.57  0.52  0.66  0.54  0.53  0.48  0.59
ZS-N       0.46  0.57  0.52  0.67  0.54  0.54  0.49  0.59
ZS-F       0.62  0.69  0.60  0.78  0.65  0.67  0.62  0.74
OVS        0.38  0.57  0.45  0.64  0.40  0.49  0.44  0.45
HG-3       0.45  0.57  0.51  0.67  0.54  0.53  0.48  0.57
HG-4       0.44  0.57  0.48  0.66  0.51  0.52  0.45  0.56
HG-2       0.53  0.56  0.50  0.50  0.54  0.55  0.53  0.47
NIMS       0.46  0.58  0.51  0.67  0.54  0.54  0.50  0.59
LASSO      0.82  0.90  0.96  0.92  0.85  0.91  0.87  0.90
DZ         0.84  0.85  0.84  0.82  0.83  0.91  0.89  0.84
ENET       0.89  0.93  0.96  0.97  0.96  0.93  0.96  0.93



Table 9: Example 4: Relative frequencies of the selected variables for methods under comparison. 
Method   MSE_y        HITS         FP
ORACLE   1.07(0.09)   2.00(0.00)   0.00(0.00)
AIC      1.48(0.05)   1.93(0.02)   2.88(0.19)
BIC      1.39(0.04)   1.94(0.02)   2.04(0.18)
BRIC     [illegible]  [illegible]  [illegible]
EB-L     1.27(0.02)   1.93(0.02)   0.58(0.10)
EB-G     1.27(0.02)   1.93(0.02)   0.60(0.10)
ZS-N     1.26(0.02)   1.93(0.02)   0.57(0.10)
ZS-F     1.33(0.03)   1.94(0.02)   1.84(0.14)
OVS      1.32(0.04)   1.89(0.03)   0.76(0.08)
HG-3     1.28(0.02)   1.93(0.02)   0.53(0.09)
HG-4     1.30(0.02)   1.93(0.02)   0.54(0.09)
HG-2     1.25(0.02)   1.93(0.02)   0.36(0.09)
NIMS     1.22(0.02)   1.93(0.02)   0.57(0.10)
LASSO    1.39(0.03)   1.99(0.01)   2.93(0.21)
DZ       1.36(0.04)   1.91(0.03)   2.70(0.18)
ENET     1.43(0.03)   1.96(0.02)   3.25(0.20)



Table 10: Example 5: Mean of MSE, HITS and FP. The numbers between parentheses are the 
corresponding standard errors. 



Variables  1     2     3     4     5     6     7     8     9
AIC        0.36  0.94  0.47  0.99  0.35  0.36  0.34  0.53  0.47
BIC        0.30  0.94  0.38  1.00  0.26  0.24  0.22  0.35  0.29
BRIC       0.10  0.94  0.09  1.00  0.10  0.03  0.05  0.08  0.05
EB-L       0.10  0.93  0.14  1.00  0.11  0.04  0.05  0.08  0.06
EB-G       0.11  0.93  0.14  1.00  0.11  0.04  0.05  0.08  0.07
ZS-N       0.10  0.93  0.13  1.00  0.11  0.04  0.05  0.08  0.06
ZS-F       0.29  0.94  0.32  1.00  0.23  0.22  0.19  0.31  0.28
OVS        0.16  0.92  0.10  0.97  0.15  0.07  0.09  0.11  0.08
HG-3       0.10  0.93  0.11  1.00  0.11  0.03  0.04  0.08  0.06
HG-4       0.10  0.93  0.12  1.00  0.11  0.03  0.04  0.08  0.06
HG-2       0.08  0.95  0.07  1.00  0.04  0.03  0.02  0.06  0.06
NIMS       0.06  0.97  0.10  1.00  0.11  0.08  0.05  0.08  0.08
LASSO      0.51  0.99  0.35  1.00  0.47  0.38  0.37  0.41  0.44
DZ         0.50  0.93  0.32  0.98  0.42  0.45  0.26  0.32  0.43
ENET       0.52  0.96  0.37  1.00  0.55  0.44  0.43  0.50  0.44



Table 11: Example 5: Relative frequencies of the selected variables for methods under comparison. 
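Since HITS and FP count, in each replication, the selected variables that are truly active and truly inactive, their means are, by linearity of expectation, sums of the per-variable selection frequencies. The sketch below checks this on the AIC row of Table 11, assuming that the active variables of Example 5 are variables 2 and 4 (an inference from the ORACLE HITS value of 2 and from the frequencies close to 1, not a statement taken from the text). Because the tabulated frequencies are rounded to two decimals, the identity only holds approximately for other rows; the function name is hypothetical.

```python
def mean_hits_fp(freqs, active):
    """Mean HITS and FP implied by per-variable selection frequencies.

    freqs: selection frequencies of variables 1..p (list index 0 = variable 1).
    active: set of truly non-zero variables (1-based indices).
    Linearity of expectation: mean HITS = sum of frequencies over active
    variables, mean FP = sum of frequencies over inactive variables.
    """
    hits = sum(f for j, f in enumerate(freqs, start=1) if j in active)
    fp = sum(f for j, f in enumerate(freqs, start=1) if j not in active)
    return round(hits, 2), round(fp, 2)

# AIC row of Table 11 (Example 5); the active set {2, 4} is an assumption.
aic_ex5 = [0.36, 0.94, 0.47, 0.99, 0.35, 0.36, 0.34, 0.53, 0.47]
print(mean_hits_fp(aic_ex5, {2, 4}))  # -> (1.93, 2.88)
```

These two values coincide with the AIC entries of Table 10.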
Method   MSE_y        FP
ORACLE   1.99(0.01)   0.00(0.00)
AIC      2.80(0.07)   3.16(0.21)
BIC      2.62(0.06)   2.24(0.19)
BRIC     2.19(0.02)   0.59(0.11)
EB-L     2.12(0.02)   2.87(0.15)
EB-G     2.11(0.02)   1.54(0.19)
ZS-N     2.26(0.02)   1.02(0.17)
ZS-F     2.31(0.03)   2.51(0.17)
OVS      2.57(0.06)   2.10(0.17)
HG-3     2.13(0.02)   2.18(0.18)
HG-4     2.10(0.01)   2.54(0.17)
HG-2     2.16(0.02)   2.17(0.15)
NIMS     2.24(0.02)   0.99(0.13)
LASSO    2.19(0.04)   1.79(0.22)
DZ       2.57(0.05)   2.49(0.20)
ENET     2.20(0.04)   2.23(0.23)



Table 12: Example 6: Mean of MSE and FP. The numbers between parentheses are the corresponding standard errors.



Variables  1     2     3     4     5     6     7     8
AIC        0.38  0.36  0.31  0.37  0.49  0.42  0.41  0.42
BIC        0.26  0.22  0.23  0.26  0.31  0.36  0.33  0.27
BRIC       0.09  0.04  0.07  0.08  0.08  0.09  0.09  0.05
EB-L       0.37  0.27  0.28  0.30  0.43  0.43  0.38  0.41
EB-G       0.19  0.12  0.16  0.16  0.21  0.27  0.25  0.18
ZS-N       0.14  0.07  0.11  0.10  0.16  0.16  0.18  0.10
ZS-F       0.29  0.27  0.23  0.28  0.41  0.38  0.34  0.31
OVS        0.26  0.26  0.36  0.23  0.28  0.26  0.28  0.17
HG-3       0.27  0.21  0.20  0.26  0.32  0.35  0.30  0.27
HG-4       0.32  0.25  0.23  0.29  0.40  0.38  0.35  0.32
HG-2       0.25  0.19  0.23  0.25  0.31  0.35  0.32  0.27
NIMS       0.12  0.06  0.10  0.11  0.14  0.17  0.18  0.11
LASSO      0.22  0.17  0.23  0.22  0.24  0.25  0.29  0.17
DZ         0.23  0.30  0.17  0.20  0.30  0.27  0.25  0.23
ENET       0.30  0.26  0.27  0.25  0.28  0.28  0.33  0.26



Table 13: Example 6: Relative frequencies of the selected variables for methods under comparison. 
no variable is always selected when k = 3. Therefore, when using NIMS, some prior assumption must be made about the magnitude of the intercept; otherwise, the criterion is over-parsimonious. If a large intercept cannot be ruled out, we suggest using the HG-2 approach instead.





                                               MSE_y        HITS         FP
$\tilde{y} = y + 10 \times \mathrm{RSS}$       3.41(0.03)   0.15(0.04)   0.00(0.00)
$\tilde{y} = y + 10^2 \times \mathrm{RSS}$     3.59(0.03)   0.01(0.01)   0.00(0.00)
$\tilde{y} = y + 10^3 \times \mathrm{RSS}$     3.59(0.02)   0.00(0.00)   0.00(0.00)



Table 14: Example 1: Mean of MSE, HITS and FP for the NIMS selection procedure after replacing $y$ by $\tilde{y} = y + 10^k \times \mathrm{RSS}$ for $k \in \{1, 2, 3\}$. The numbers between parentheses are the corresponding standard errors.



Variables                                    1     2     3     4     5     6     7     8     9     10
$\tilde{y} = y + 10 \times \mathrm{RSS}$     0.00  0.00  0.09  0.00  0.00  0.05  0.01  0.00  0.00  0.00
$\tilde{y} = y + 10^2 \times \mathrm{RSS}$   0.00  0.00  0.01  0.00  0.00  0.01  0.00  0.00  0.00  0.00
$\tilde{y} = y + 10^3 \times \mathrm{RSS}$   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00



Table 15: Example 1: Relative frequencies of the selected variables after replacing $y$ by $\tilde{y} = y + 10^k \times \mathrm{RSS}$ for $k \in \{1, 2, 3\}$.



4.3 Real datasets 

The two datasets considered in this section involve a moderate number of variables relative to the number of observations.



Body fat dataset The body fat dataset was first used by Penrose et al. (1985). The corresponding study aims at estimating the percentage of body fat from various body circumference measurements recorded on 252 men. The thirteen regressors are:

1. age, 

2. weight (lbs), 

3. height (inches), 

4. neck circumference, 

5. chest circumference, 

6. abdomen 2 circumference, 

7. hip circumference, 

8. thigh circumference, 

9. knee circumference, 

10. ankle circumference, 

11. biceps (extended) circumference, 
12. forearm circumference,

13. wrist circumference. 



To assess the performance of the different methods, the dataset of Penrose et al. (1985) was randomly split 25 times into a training set of 151 observations and a test set of 101 observations. Tuning parameters for the frequentist regularization methods were chosen by minimizing the ten-fold cross-validated prediction error.
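The text only states that tuning parameters were chosen by minimizing the ten-fold cross-validated prediction error. As an illustration of that criterion (not the authors' code), the sketch below tunes the penalty of a lasso with a single predictor and no intercept, for which the estimate has the closed form $S(x^\top y, \lambda) / x^\top x$ with $S$ the soft-thresholding operator. The actual procedures use all thirteen regressors and full lasso, Dantzig selector or elastic net solvers; the simulated data, the penalty grid and all function names are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(x, y, lam):
    """Lasso with one predictor and no intercept, in closed form:
    argmin_b 0.5 * sum((y_i - b * x_i)^2) + lam * |b| = S(x'y, lam) / x'x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx

def cv_error(x, y, lam, k=10, seed=0):
    """k-fold cross-validated mean squared prediction error for penalty lam.

    The fixed seed yields the same folds for every value of lam."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        x_tr = [x[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        b = lasso_1d(x_tr, y_tr, lam)
        for i in fold:
            total += (y[i] - b * x[i]) ** 2
            count += 1
    return total / count

def choose_lambda(x, y, grid, k=10):
    """Grid value of lam with the smallest k-fold CV error."""
    return min(grid, key=lambda lam: cv_error(x, y, lam, k))

# Hypothetical simulated training set: 151 points, true slope 2.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(151)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
grid = [0.0, 1.0, 10.0, 100.0, 1000.0]
lam_star = choose_lambda(x, y, grid)
print(lam_star)
```

Reusing the same folds for every candidate penalty keeps the comparison between grid values paired, so differences in CV error reflect the penalty rather than the fold assignment.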

For this dataset, the Bayesian procedures we investigated are much more parsimonious than the standard regularization procedures, as shown in Table 16. The prediction MSEs, in contrast, hardly vary across methods. (We stress that the MSEs of the Bayesian procedures are computed by model averaging.) As in the simulation experiments, all Bayesian approaches behave very similarly, except for ZS-F, which selects the last two covariates more often.
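For the Bayesian procedures, model averaging predicts each test response by the average of the predictions of the candidate models, weighted by their posterior model probabilities, instead of relying on a single selected model. The sketch below assumes that those probabilities and the per-model predictions are already available; it does not reproduce how the paper computes them, and the helper names and numbers are hypothetical.

```python
def bma_predict(model_probs, model_preds):
    """Model-averaged predictions.

    model_probs: posterior probabilities p(m | y) of the models (summing to 1).
    model_preds: model_preds[m][i] = prediction of model m for test case i.
    """
    n = len(model_preds[0])
    return [sum(w * preds[i] for w, preds in zip(model_probs, model_preds))
            for i in range(n)]

def mse(y_true, y_pred):
    """Mean squared prediction error on the test set."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Two hypothetical models with posterior probabilities 0.75 and 0.25.
probs = [0.75, 0.25]
preds = [[1.0, 2.0, 3.0],   # predictions of model 1
         [3.0, 2.0, 1.0]]   # predictions of model 2
y_test = [1.5, 2.0, 2.5]
avg = bma_predict(probs, preds)
print(avg, mse(y_test, avg))  # -> [1.5, 2.0, 2.5] 0.0
```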



Ozone data This second benchmark dataset is taken from Breiman and Friedman (1985) and consists of daily measurements of the maximum ozone concentration and of eight meteorological variables near Los Angeles. These variables are:

1. the daily ozone concentration (maximum one-hour average, parts per million) at Upland, CA, which is the response variable;

2. the Vandenburg 500 millibar pressure height (m); 

3. the wind speed (mph) at Los Angeles International Airport (LAX); 

4. the humidity (percent) at LAX; 

5. the Sandburg Air Force Base temperature (°F);

6. the inversion base height at LAX; 

7. the inversion base temperature at LAX; 

8. the Daggett Pressure gradient (mm Hg) from LAX to Daggett, CA; 

9. the visibility (miles) at LAX. 

The original Ozone database contains 366 observations, of which 203 are complete; our study uses only the complete observations. We randomly split this dataset 25 times into a training set of 101 observations and a test set of 102 observations.
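The complete-case restriction and the repeated random splits can be sketched as follows. Missing values are assumed to be encoded as `None`, the seed is arbitrary, and the helper names are hypothetical; calling `random_splits(252, 151)` would mimic the body fat protocol in the same way.

```python
import random

def complete_cases(rows):
    """Keep only the rows with no missing value (None) in any column."""
    return [r for r in rows if all(v is not None for v in r)]

def random_splits(n, n_train, n_splits=25, seed=0):
    """Yield n_splits random (train, test) partitions of range(n).

    Each training set holds n_train indices; the remaining n - n_train
    indices form the test set.
    """
    rng = random.Random(seed)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

# 203 complete observations, 101 for training and 102 for testing.
splits = list(random_splits(203, 101))
train, test = splits[0]
print(len(splits), len(train), len(test))  # -> 25 101 102
```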



For this dataset, as shown by Tables 18 and 19, all Bayesian approaches, as well as AIC and BIC, select about three variables, while the regularization methods select about five. The MSE differences between all procedures are negligible. (This lack of significant differences in the MSEs is also visible in the boxplots of Figure 1.)

5 Conclusion 



In this numerical study, we have compared Bayesian variable selection methods with regularization methods in a poorly informative setting. From a variable selection point of view, the Bayesian methods appear to be more parsimonious and to select more relevant variables than the regularization methods. From a predictive point of view, there is no significant difference between the two approaches. Regularization methods could, however, be expected to perform better in this respect, since they minimize a cross-validated prediction error. Owing to the efficiency of model averaging, the Bayesian methods nonetheless achieve competitive MSEs.
An additional contribution of this study is to single out and assess two calibration-free prior models (NIMS and HG-2). Both appear as valuable competitors to earlier Bayesian approaches. However, each has a clear drawback: NIMS is not location invariant and HG-2 excludes the null model. Nonetheless, our series of examples shows that they provide an acceptable objective Bayesian solution for variable selection and regularization in linear models.

A limitation of this study of our objective Bayesian approach is that we do not consider high-dimensional settings such as those of Bottolo and Richardson (2010), which require different computational tools to handle the enormous number of potential models. This difficulty is shared by all the Bayesian solutions considered in this paper and does not affect the validity of the prior modelling.



Acknowledgments We are grateful to the Associate Editor and one reviewer for their very valuable comments and suggestions on a previous version of this paper. They greatly contributed
Method   MSE_y        Mean number of selected variables
AIC      4.58(0.05)   5.56(0.20)
BIC      4.60(0.05)   4.20(0.18)
BRIC     [illegible]  [illegible]
EB-L     4.52(0.05)   3.00(0.18)
EB-G     4.52(0.05)   3.28(0.17)
ZS-N     4.52(0.05)   2.96(0.18)
ZS-F     4.49(0.05)   4.28(0.20)
OVS      4.65(0.07)   2.96(0.18)
HG-3     4.54(0.05)   3.00(0.18)
HG-4     4.56(0.05)   3.24(0.17)
HG-2     4.50(0.05)   2.48(0.14)
NIMS     4.50(0.05)   2.44(0.14)
LASSO    4.54(0.05)   8.17(0.52)
DZ       4.51(0.06)   11.03(0.11)
ENET     4.54(0.05)   9.04(0.56)



Table 16: Body fat dataset: Mean of the MSE_y and of the number of selected variables.



Variables  1     2     3     4     5     6     7     8     9     10    11    12    13
AIC        0.44  0.84  0.16  0.64  0.04  1.00  0.20  0.16  0.08  0.16  0.44  0.80  0.88
BIC        0.08  0.84  0.08  0.32  0.00  1.00  0.12  0.08  0.04  0.00  0.16  0.28  0.40
BRIC       0.08  0.84  0.08  0.32  0.00  1.00  0.12  0.08  0.04  0.00  0.16  0.24  0.40
EB-L       0.08  0.84  0.08  0.32  0.00  1.00  0.12  0.08  0.04  0.00  0.16  0.28  0.40
EB-G       0.08  0.88  0.08  0.36  0.00  1.00  0.08  0.08  0.04  0.00  0.20  0.36  0.40
ZS-N       0.08  0.84  0.08  0.32  0.00  1.00  0.12  0.08  0.04  0.00  0.16  0.24  0.40
ZS-F       0.20  0.84  0.12  0.40  0.00  1.00  0.12  0.12  0.08  0.04  0.24  0.60  0.68
OVS        0.12  0.68  0.08  0.16  0.04  1.00  0.08  0.00  0.00  0.00  0.04  0.24  0.52
HG-3       0.08  0.84  0.08  0.32  0.00  1.00  0.12  0.08  0.04  0.00  0.16  0.28  0.40
HG-4       0.08  0.88  0.08  0.32  0.00  1.00  0.08  0.08  0.04  0.00  0.16  0.36  0.40
HG-2       0.04  0.88  0.00  0.08  0.00  1.00  0.08  0.04  0.00  0.04  0.16  0.28  0.60
NIMS       0.04  0.88  0.04  0.08  0.00  1.00  0.04  0.08  0.04  0.00  0.04  0.04  0.12
LASSO      1.00  0.28  1.00  0.88  0.24  1.00  0.44  0.52  0.28  0.56  0.68  0.84  1.00
DZ         1.00  0.80  1.00  0.88  0.60  1.00  0.80  0.72  0.40  0.88  0.92  0.88  0.96
ENET       1.00  0.40  1.00  0.80  0.28  1.00  0.40  0.64  0.44  0.64  0.68  0.84  1.00



Table 17: Body fat dataset: relative frequencies of selection of the variables over the 25 random splits.
Method   MSE_y        Mean number of selected variables
AIC      4.79(0.05)   3.52(0.14)
BIC      4.77(0.05)   2.88(0.07)
BRIC     [illegible]  [illegible]
EB-L     4.78(0.05)   2.88(0.07)
EB-G     4.78(0.05)   2.92(0.05)
ZS-N     4.78(0.05)   2.88(0.07)
ZS-F     4.77(0.05)   3.12(0.07)
OVS      4.81(0.05)   2.88(0.10)
HG-3     4.78(0.05)   2.88(0.07)
HG-4     4.78(0.05)   2.92(0.05)
HG-2     4.80(0.05)   2.68(0.10)
NIMS     4.79(0.05)   2.68(0.10)
LASSO    4.78(0.05)   5.24(0.21)
DZ       4.80(0.05)   5.12(0.13)
ENET     4.79(0.05)   5.32(0.16)



Table 18: Ozone dataset: Mean of the MSE_y and of the number of selected variables.



Variables  1     2     3     4     5     6     7     8
AIC        0.20  0.12  0.96  1.00  0.56  0.08  0.44  0.16
BIC        0.04  0.00  0.96  1.00  0.60  0.00  0.36  0.04
BRIC       0.04  0.00  0.96  1.00  0.60  0.00  0.40  0.04
EB-L       0.04  0.00  0.96  1.00  0.60  0.40  0.36  0.04
EB-G       0.04  0.00  0.96  1.00  0.60  0.00  0.36  0.04
ZS-N       0.04  0.00  0.96  1.00  0.60  0.00  0.36  0.04
ZS-F       0.04  0.08  0.92  1.00  0.60  0.00  0.40  0.08
OVS        0.00  0.00  1.00  0.92  0.00  0.00  0.80  0.08
HG-3       0.04  0.00  0.96  1.00  0.60  0.00  0.36  0.04
HG-4       0.04  0.00  0.96  1.00  0.60  0.00  0.36  0.04
HG-2       0.04  0.00  0.96  1.00  0.60  0.00  0.32  0.04
NIMS       0.04  0.00  0.96  1.00  0.60  0.00  0.32  0.04
LASSO      0.00  0.00  1.00  1.00  1.00  0.00  1.00  1.00
DZ         0.00  0.00  1.00  1.00  1.00  0.00  1.00  1.00
ENET       0.00  0.00  1.00  1.00  1.00  0.00  1.00  1.00



Table 19: Ozone dataset: relative frequencies of selections of the variables over the 25 random 
splits. 
[Figure 1 panels: "Comparison of methods for 25 random splits of Bodyfat data" and "Comparison of methods for 25 random splits of Ozone data", each showing BIC, NIMS, LASSO and ENET.]

Figure 1: Body fat and Ozone datasets: variability of the root mean squared errors over 25 random 
splits for BIC, NIMS, LASSO and ENET methods. 